Toward Marker-free 3D Pose Estimation in Lifting: A Deep Multi-view Solution

Authors

  • Rahil Mehrizi
  • Xi Peng
  • Zhiqiang Tang
  • Xu Xu
  • Dimitris N. Metaxas
  • Kang Li
Abstract

Lifting is a common manual material handling task in the workplace and is considered one of the main risk factors for work-related musculoskeletal disorders. To improve workplace safety, it is necessary to assess the musculoskeletal and biomechanical risk exposures associated with these tasks, which requires very accurate 3D pose estimates. Existing approaches mainly use marker-based sensors to collect 3D information. However, these methods are usually expensive to set up, time-consuming to process, and sensitive to the surrounding environment. In this study, we propose a multi-view deep perceptron approach to address these limitations. Our approach consists of two modules: a "view-specific perceptron" network extracts rich information independently from the image of each view, including both 2D shape and hierarchical texture information, while a "multi-view integration" network synthesizes information from all available views to predict an accurate 3D pose. To fully evaluate our approach, we carried out comprehensive experiments comparing different variants of our design. The results show that our approach achieves performance comparable to previous marker-based methods, i.e., an average error of 14.72 ± 2.96 mm on the lifting dataset. The results are also compared with state-of-the-art methods on the HumanEva-I dataset [1], demonstrating the superior performance of our approach.

Keywords: markerless 3D human pose estimation; deep neural network; lifting
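The accuracy figure in the abstract is an average 3D error in millimetres with a spread. The abstract does not define the metric, so the following is only a minimal sketch under the assumption that the error is the Euclidean distance between predicted and reference joint positions, averaged over joints, and that the spread is a sample standard deviation across frames. The function name `mean_joint_error` and the toy coordinates are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, stdev


def mean_joint_error(pred, ref):
    """Mean Euclidean distance (mm) between corresponding 3D joints of one pose."""
    if len(pred) != len(ref):
        raise ValueError("predicted and reference poses must have the same joints")
    return mean(math.dist(p, r) for p, r in zip(pred, ref))


# Toy reference poses (mm): two frames, two joints each. Illustrative values only.
ref_poses = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, 400.0)],
    [(100.0, 0.0, 0.0), (100.0, 0.0, 400.0)],
]
# Toy predicted poses for the same frames and joints.
pred_poses = [
    [(3.0, 4.0, 0.0), (0.0, 0.0, 410.0)],
    [(100.0, 0.0, 12.0), (100.0, 5.0, 412.0)],
]

# One error value per frame, then the average and spread across frames.
per_frame = [mean_joint_error(p, r) for p, r in zip(pred_poses, ref_poses)]
avg, spread = mean(per_frame), stdev(per_frame)
print(f"{avg:.2f} ± {spread:.2f} mm")
```

Whether the paper aggregates over frames, subjects, or trials is not stated in the abstract; the sketch only illustrates the general shape of such a computation.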

Similar resources

Multi-camera 3D human pose estimation by fitting the projection of an articulated 3D skeleton model to silhouette images

Automatic capture and analysis of human motion based on images or video is an important issue in computer vision due to the vast number of applications in animation, surveillance, biomechanics, human-computer interaction, entertainment, and the game industry. In these applications, 3D human pose estimation is clearly an essential part. Therefore, its accuracy has a great effect on the perform...

A deep learning approach for pose estimation from volumetric OCT data.

Tracking the pose of instruments is a central problem in image-guided surgery. For microscopic scenarios, optical coherence tomography (OCT) is increasingly used as an imaging modality. OCT is suitable for accurate pose estimation due to its micrometer range resolution and volumetric field of view. However, OCT image processing is challenging due to speckle noise and reflection artifacts in add...

2D-3D Pose Consistency-based Conditional Random Fields for 3D Human Pose Estimation

This study considers the 3D human pose estimation problem in a single RGB image by proposing a conditional random field (CRF) model over 2D poses, in which the 3D pose is obtained as a byproduct of the inference process. The unary term of the proposed CRF model is defined based on a powerful heat-map regression network, which has been proposed for 2D human pose estimation. This study also prese...

Multi-View 3D Pose Estimation from Single Depth Images

In this paper, we investigate the problem of multi-view 3D human pose estimation from depth images using deep learning methods. We utilize an iterative approach that progressively makes changes to an initial mean pose by feeding back error predictions. Our model is evaluated on a newly collected dataset (ITOP) that contains 30K annotated depth images from top-down and frontal views. Experiments...

Real-time marker-less multi-person 3D pose estimation in RGB-Depth camera networks

This paper proposes a novel system to estimate and track the 3D poses of multiple persons in calibrated RGB-Depth camera networks. The multi-view 3D pose of each person is computed by a central node which receives the single-view outcomes from each camera of the network. Each single-view outcome is computed by using a CNN for 2D pose estimation and extending the resulting skeletons to 3D by mean...

Journal:
  • CoRR

Volume: abs/1802.01741

Publication date: 2018